
    AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

    Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among others, recent programming frameworks offer simpler syntax, automatic memory management and garbage collection, simpler code reuse through library packages, and easily configurable deployment tools. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Moreover, the community has developed a large number of libraries and modules and tuned them to obtain great performance. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module to automatically find an appropriate task-based parallelization of affine loop nests and execute them in parallel on a distributed computing infrastructure. This parallelization can also include the building of data blocks to increase task granularity in order to achieve good execution performance. Moreover, AutoParallel is based on sequential programming and only requires a small annotation in the form of a Python decorator, so that anyone with minimal programming skills can scale up an application to hundreds of cores. Comment: Accepted to the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018).
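
    As a rough illustration of what such an annotated application might look like, the sketch below decorates a sequential affine loop nest; the decorator name and import path are assumptions for illustration, not the module's confirmed API.

```python
# Hypothetical sketch: annotating a sequential affine loop nest so that a
# module like AutoParallel can taskify it. The decorator name and import
# path below are assumed for illustration only.
from pycompss.api.parallel import parallel  # assumed import path

@parallel()  # the only annotation the user adds to the sequential code
def matmul(a, b, c, n):
    # Affine loop nest: loop bounds and array accesses are affine functions
    # of the loop indices, so the iteration space can be analyzed and split
    # into distributed tasks (optionally grouped into data blocks).
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
```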

    Towards Automatic Application Migration to Clouds

    Porting applications to Clouds is one of the key challenges in the software industry. The available approaches to this task are essentially either services derived from alliances of major software vendors and Cloud providers, which focus on their own products, or small platform providers, which focus on the most popular software stacks. For migrating other types of software, the options are limited to Infrastructure-as-a-Service (IaaS) solutions, which require considerable programming effort to adapt the software to a Cloud provider's API. Moreover, if the software must be deployed on different providers, new integration procedures must be designed and implemented, which can quickly become a nightmare. This paper presents a solution that facilitates the migration of any application to the cloud by inferring the most suitable deployment model for the application and automatically deploying it on the available Cloud providers.
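
    As a toy picture of the kind of decision being automated, the sketch below matches an application's declared software stack against possible deployment models; the catalogue, rules, and field names are invented for illustration and are not the paper's actual inference mechanism.

```python
# Hypothetical sketch of inferring a deployment model from an application
# description. The catalogue, rules, and names are illustrative only.
SUPPORTED_PLATFORM_STACKS = {("postgresql", "python"), ("java", "mysql")}

def infer_deployment_model(app):
    """Pick the least-effort deployment model for a described application."""
    stack = tuple(sorted(app["stack"]))
    if stack in SUPPORTED_PLATFORM_STACKS:
        return "platform"   # a platform provider already supports this stack
    if app.get("containerized"):
        return "container"  # ship the existing image to any provider
    return "iaas"           # fall back to raw VMs plus custom provisioning

app = {"stack": ["postgresql", "python"], "containerized": False}
print(infer_deployment_model(app))  # -> "platform"
```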

    Hyperparameter optimization using agents for large scale machine learning

    Machine learning (ML) has become an essential tool for obtaining sound predictions in many aspects of everyday life. Hyperparameter optimization algorithms are a tool for building better ML models; they work by iteratively executing sets of trials, and the trials usually have different execution times. In this paper we optimize the grid search and random search with cross-validation from dislib [1], an ML library for distributed computing built on top of the PyCOMPSs [2] programming model, taking inspiration from Maggy [3], an open-source framework based on Spark. The optimization uses agents to avoid trials having to wait for each other, achieving a speed-up of over 2.5x compared to the previous implementation.
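
    The gain comes from not synchronizing trials at batch boundaries. The sketch below shows that idea in framework-agnostic terms, using plain Python futures as a stand-in for distributed agents; it is not the dislib or agents API itself.

```python
# Sketch: launch hyperparameter trials asynchronously and consume each result
# as soon as the trial finishes, instead of waiting for a whole batch.
# Plain Python threads stand in for distributed agents here.
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_trial(params):
    # Placeholder for training a model and cross-validating it with `params`.
    return params, random.random()

search_space = [{"lr": lr, "depth": d}
                for lr in (0.1, 0.01, 0.001) for d in (3, 5, 7)]

best = None
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_trial, p) for p in search_space]
    for fut in as_completed(futures):   # consume trials as they finish,
        params, score = fut.result()    # with no barrier between them
        if best is None or score > best[1]:
            best = (params, score)
print("best:", best)
```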

    A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one

    This paper aims to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows from now on) under a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end. Comment: Accepted in Future Generation Computer Systems (FGCS). Licensed under CC-BY-NC-ND.
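
    The essence of combining tasks with dataflows is that a consumer task can start processing items while the producer task is still emitting them. The runnable sketch below illustrates that pattern with a plain queue and threads; it is only a stand-in for the idea and does not use the Distributed Stream Library's API.

```python
# Runnable stand-in for the streaming idea: a producer task publishes items
# while a consumer task polls them concurrently, so the consumer does not
# wait for the producer to finish. The real library's API is not shown here.
import queue
import threading

SENTINEL = object()

def producer(stream, n):
    for i in range(n):
        stream.put({"sample": i})   # publish items as they are generated
    stream.put(SENTINEL)            # signal end of stream

def consumer(stream, results):
    while True:
        item = stream.get()         # poll items as they arrive
        if item is SENTINEL:
            break
        results.append(item)

stream, results = queue.Queue(), []
threads = [threading.Thread(target=producer, args=(stream, 5)),
           threading.Thread(target=consumer, args=(stream, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```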

    Extension of a task-based model to functional programming

    Recently, efforts have been made to bring together the areas of high-performance computing (HPC) and massive data processing (Big Data). Traditional HPC frameworks, like COMPSs, are mostly task-based, while popular Big Data environments, like Spark, are based on functional programming principles. The former are known for their good performance on regular, matrix-based computations; the latter, on the other hand, have often been considered more successful for fine-grained, data-parallel workloads. In this paper we present our experience with the integration of some dataflow techniques into COMPSs, a task-based framework, in an effort to bring together the best aspects of both worlds. We present our API, called DDF, which provides a new data abstraction that addresses the challenges of integrating Big Data application scenarios into COMPSs. DDF has a functional-style interface, similar to many Data Science tools, that allows us to use dynamic evaluation to adapt task execution at runtime. Besides the performance optimization it provides, the API facilitates the development of applications by experts in the application domain. In this paper we evaluate DDF's effectiveness by comparing the resulting programs to their original versions in COMPSs and Spark. The results show that DDF can improve COMPSs execution times and even outperform Spark in many use cases. This work was partially supported by CAPES, CNPq, Fapemig and NIC.BR, and by projects Atmosphere (H2020-EU.2.1.1 777154) and INCT-Cyber.
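
    To give a feel for what a functional, chained data abstraction over a task-based runtime looks like, the sketch below mimics the style with plain Python. The class and method names are illustrative, not DDF's documented API, and results are computed eagerly, whereas a real runtime would build a lazy task graph.

```python
# Illustrative sketch of a functional, chained data abstraction in the spirit
# of the described API. Names are hypothetical; evaluation here is eager.
class MiniDDF:
    def __init__(self, data):
        self.data = list(data)

    def map(self, fn):
        return MiniDDF(fn(x) for x in self.data)

    def filter(self, pred):
        return MiniDDF(x for x in self.data if pred(x))

    def count(self):
        return len(self.data)

words = MiniDDF(["big", "data", "meets", "hpc"])
print(words.map(str.upper).filter(lambda w: len(w) > 3).count())  # -> 2
```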

    Service Orchestration on a Heterogeneous Cloud Federation

    In recent years, cloud computing has emerged as a way to obtain computing resources on demand in a very dynamic fashion, paying only for what is consumed. Nowadays, there are several hosting providers that follow this approach, offering resources with different capabilities, prices, and SLAs. Depending on the user's preferences and the application requirements, one resource provider may therefore be a better fit than another. In this paper, we present an architecture for federating clouds that aggregates resources from different providers, decides which resources and providers best serve the user's interests, and coordinates the application deployment on the selected resources, giving the user the impression that a single cloud is being used.
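
    The core decision made by such a federation layer can be pictured as scoring each provider's offer against the user's preferences and the application requirements. The sketch below is a toy version of that matching; the fields, offers, and weights are made up for illustration.

```python
# Toy sketch of provider selection in a federation: score each offer against
# the application requirements and the user's price/SLA preferences.
# Offers, fields, and weights are invented for illustration.
offers = [
    {"provider": "A", "cpus": 64, "price_per_hour": 3.2, "sla_availability": 0.999},
    {"provider": "B", "cpus": 32, "price_per_hour": 1.1, "sla_availability": 0.995},
]

def score(offer, required_cpus, price_weight=0.5, sla_weight=0.5):
    if offer["cpus"] < required_cpus:
        return float("-inf")   # offer cannot host the application at all
    return sla_weight * offer["sla_availability"] - price_weight * offer["price_per_hour"]

best = max(offers, key=lambda o: score(o, required_cpus=32))
print("deploy on:", best["provider"])  # -> provider B (cheaper, SLA close enough)
```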

    Semantic resource allocation with historical data based predictions

    One of the most important issues for Service Providers in Cloud Computing is delivering a good quality of service. This is achieved by adapting to a changing environment in which different failures can occur during the execution of services and tasks. Some of these failures can be predicted from the information obtained in previous executions, and the results of these predictions help the schedulers improve the allocation of resources to tasks. In this paper, we present a framework that uses semantically enhanced historical data to predict the behavior of tasks and resources in the system and to allocate resources according to these predictions.
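
    Reduced to its essentials, the allocation idea is: estimate each resource's likely behavior for a task from its execution history, then place the task where the prediction looks best. A toy sketch of that flow, with invented history records and a simple scoring rule rather than the paper's semantic machinery:

```python
# Toy sketch: predict per-resource behaviour for a task class from historical
# runs, then allocate to the resource with the best expected outcome.
# History records and the scoring rule are invented for illustration.
from statistics import mean

history = {
    ("render", "node1"): [{"runtime": 120, "failed": False}, {"runtime": 110, "failed": True}],
    ("render", "node2"): [{"runtime": 150, "failed": False}, {"runtime": 145, "failed": False}],
}

def predict(task_type, resource):
    runs = history.get((task_type, resource), [])
    return {"runtime": mean(r["runtime"] for r in runs),
            "failure_rate": mean(r["failed"] for r in runs)}

def allocate(task_type, resources):
    # Prefer the lowest predicted failure rate, then the lowest predicted runtime.
    predictions = {r: predict(task_type, r) for r in resources}
    return min(predictions,
               key=lambda r: (predictions[r]["failure_rate"], predictions[r]["runtime"]))

print(allocate("render", ["node1", "node2"]))  # -> node2 (no recorded failures)
```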

    A Study of Checkpointing in Large Scale Training of Deep Neural Networks

    Deep learning (DL) applications are increasingly being deployed on HPC systems to leverage the massive parallelism and computing power of those systems for DL model training. While significant effort has been put into facilitating distributed training in DL frameworks, fault tolerance has been largely ignored. In this work, we evaluate checkpoint-restart, a common fault tolerance technique in HPC workloads. We perform experiments with three state-of-the-art DL frameworks common in HPC (Chainer, PyTorch, and TensorFlow). We evaluate the computational cost of checkpointing, file formats and file sizes, the impact of scale, and deterministic checkpointing. Our evaluation shows some critical differences in checkpoint mechanisms and exposes several bottlenecks in existing checkpointing implementations. We provide discussion points that can aid users in selecting a fault-tolerant framework to use in HPC, as well as takeaway points that framework developers can use to facilitate better checkpointing of DL workloads in HPC.
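
    Checkpoint-restart in this setting boils down to periodically persisting model and optimizer state and resuming from the latest file after a failure. A minimal PyTorch example of that pattern (an illustration of the technique, not the paper's benchmarking code):

```python
# Minimal checkpoint-restart pattern in PyTorch (illustrative only): save
# model/optimizer state every epoch and resume from the latest checkpoint.
import os
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
ckpt_path = "checkpoint.pt"

start_epoch = 0
if os.path.exists(ckpt_path):                       # restart case
    state = torch.load(ckpt_path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 5):
    loss = model(torch.randn(32, 10)).mean()        # stand-in training step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Checkpoint cost is paid once per epoch in this toy setup.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, ckpt_path)
```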

    Enabling System Wide Shared Memory for Performance Improvement in PyCOMPSs Applications

    Python has been gaining traction for years in the world of scientific applications. However, the high-level abstraction it provides may not allow developers to use machines to their peak performance. To address this, multiple strategies, sometimes complementary, have been developed to enrich the software ecosystem, either by relying on additional libraries dedicated to efficient computation (e.g., NumPy) or by providing a framework to better exploit HPC-scale infrastructures (e.g., PyCOMPSs). In this paper, we present a Python extension based on SharedArray that enables the use of system-provided shared memory and its integration into the PyCOMPSs programming model, as an example of integration into a complex Python environment. We also evaluate the impact such a tool may have on performance in two types of distributed execution flows: one for linear algebra with a blocked matrix multiplication application, and the other in the context of data clustering with a k-means application. We show that with very little modification of the task-based application (3 lines of code in the original decorator), the performance gain can rise above 40% for tasks relying heavily on data reuse in a distributed environment, especially when loading the data accounts for a prominent share of the execution time. This work was partly funded by the EXPERTISE project (http://www.msca-expertise.eu/), which has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721865. BSC authors have also been supported by the Spanish Government through contracts SEV2015-0493 and TIN2015-65316-P, and by Generalitat de Catalunya through contract 2014-SGR-1051.
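
    The underlying mechanism is the SharedArray package, which exposes NumPy arrays backed by system shared memory so that processes on the same node can reuse data without copies. The snippet below shows that core usage on its own, to the best of our understanding of the package; the PyCOMPSs decorator integration described in the paper is not reproduced here.

```python
# Core SharedArray usage (illustrative): one process creates a named
# shared-memory array, other processes on the same node attach to it and
# read the data without copies. The PyCOMPSs integration is not shown.
import numpy as np
import SharedArray as sa

# Producer side: create and fill a system-wide shared array.
a = sa.create("shm://block0", (1024, 1024))
a[:] = np.random.rand(1024, 1024)

# Consumer side (e.g., another task on the same node): attach and reuse, no copy.
b = sa.attach("shm://block0")
print(b.sum())

sa.delete("shm://block0")   # remove the segment when it is no longer needed
```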